Multiplication of Matrices of Arbitrary Shape on a Data
Authors
Abstract
Some level-2 and level-3 Distributed Basic Linear Algebra Subroutines (DBLAS) that have been implemented on the Connection Machine system CM-200 are described. No assumption is made on the shape or size of the operands. For matrix-matrix multiplication, both the nonsystolic and the systolic algorithms are outlined. A systolic algorithm that computes the product matrix in-place is described in detail. We show that a level-3 DBLAS yields better performance than a level-2 DBLAS. On the Connection Machine system CM-200, blocking yields a performance improvement by a factor of up to three over level-2 DBLAS. For certain matrix shapes the systolic algorithms offer both improved performance and significantly reduced temporary storage requirements compared to the nonsystolic block algorithms. We show that, in order to minimize the communication time, an algorithm that leaves the largest operand matrix stationary should be chosen for matrix-matrix multiplication. Furthermore, it is shown both analytically and experimentally that the optimum shape of the processor array yields square stationary submatrices in each processor, i.e., the ratio between the lengths of the axes of the processing array must be the same as the ratio between the corresponding axes of the stationary matrix. The optimum processor array shape may yield a factor of five performance enhancement for the multiplication of square matrices. For rectangular matrices a factor of 30 improvement was observed for an optimum processor array shape compared to a poorly chosen processor array shape.
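The abstract's processor-array-shape criterion (the grid axes should be in the same ratio as the axes of the stationary matrix, so each processor holds a square local block) can be illustrated with a small sketch. This is a hypothetical illustration, not code from the paper: the function name `best_grid` and the exhaustive divisor search are assumptions made here for clarity.

```python
# Hypothetical sketch of the abstract's criterion: pick a p x q processor
# grid (p * q = nprocs) so that the local (m/p) x (k/q) block of the
# stationary m x k operand is as close to square as possible.
def best_grid(nprocs, m, k):
    best = None
    for p in range(1, nprocs + 1):
        if nprocs % p:
            continue  # only factorizations p * q = nprocs
        q = nprocs // p
        local_rows, local_cols = m / p, k / q
        # Aspect ratio of the local stationary block; 1.0 means square.
        aspect = max(local_rows, local_cols) / min(local_rows, local_cols)
        if best is None or aspect < best[0]:
            best = (aspect, p, q)
    return best[1], best[2]

# A square 1024 x 1024 stationary matrix on 64 processors favors the
# square 8 x 8 grid, giving square 128 x 128 local blocks.
print(best_grid(64, 1024, 1024))  # -> (8, 8)
```

For a rectangular 4096 x 1024 stationary matrix on the same 64 processors, the search instead picks a 16 x 4 grid, again yielding square (256 x 256) local blocks, which is consistent with the large rectangular-matrix gains the abstract reports.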
Similar Articles
A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; therefore, a method that can run on all structures, especially the Fibonacci Hypercube structure, is necessary for parallel matr...
Full text

Electro-spun organic nanofibers elaboration process investigations using BPs operational matrices
In this paper, the operational matrix of Bernstein Polynomials (BPs) is used to solve the Bratu equation. This nonlinear equation appears in the particular electrospun nanofibers fabrication process framework. Electrospun organic nanofibers have been used for a large variety of filtration applications, such as in the non-woven and filtration industries. By using the operational matrix of fractional integration...
Full text

Distributed General Matrix Multiply and Add for a 2D Mesh Processor Network
A distributed algorithm with the same functionality as the single-processor level 3 BLAS operation GEMM, i.e., general matrix multiply and add, is presented. By the same functionality we mean the ability to perform GEMM operations on arbitrary subarrays of the matrices involved. The logical network is a 2D square mesh with torus connectivity. The matrices involved are distributed with non-sc...
Full text

A note on primary-like submodules of multiplication modules
Primary-like and weakly primary-like submodules are two new generalizations of primary ideals from rings to modules. In fact, the class of primary-like submodules of a module lies properly between primary submodules and weakly primary-like submodules. In this note, we show that these three classes coincide when their elements are submodules of a multiplication module and satisfy the primeful pr...
Full text

Data confidentiality in cloud-based pervasive systems
Data confidentiality and privacy is a serious concern in pervasive systems where cloud computing is used to process huge amounts of data, such as the matrix multiplications typically used in HPC. Due to limited processing capabilities, smart devices need to rely on cloud servers for heavy-duty computations such as matrix multiplication. Conventional security mechanisms such as public key encryption is...
Full text